A common vision from science fiction is that robots will one day inhabit our physical spaces, perceive the world as we do, assist our physical labour, and communicate with us through natural language. Here, we study how artificial agents can use the simplified setting of a virtual environment to interact naturally with humans. We show that imitation learning of human-human interactions in a simulated world, combined with self-supervised learning, is sufficient to produce a multimodal interactive agent, which we call MIA, that successfully interacts with non-adversarial humans 75% of the time. We further identify architectural and algorithmic techniques that improve performance, such as hierarchical action selection. Altogether, our results demonstrate that imitating multimodal, real-time human behaviour provides a straightforward and surprisingly effective means of imbuing agents with a rich behavioural prior, which can then be fine-tuned for specific purposes, laying a foundation for training capable interactive robots or digital assistants. A video of MIA's behaviour can be found at https://youtu.be/zfgrif7my
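The abstract credits hierarchical action selection as one technique that improves performance. As a rough, illustrative sketch of the general idea (not MIA's actual architecture; the module names, dimensions, and two-way move/speak split are assumptions), a behavioural-cloning policy might first choose an action type and then a concrete action within that type:

```python
# Minimal sketch of hierarchical action selection for a multimodal
# behavioural-cloning agent. All names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class HierarchicalPolicy(nn.Module):
    def __init__(self, obs_dim=512, n_move_actions=20, vocab_size=4000):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        self.type_head = nn.Linear(256, 2)          # 0 = move, 1 = speak
        self.move_head = nn.Linear(256, n_move_actions)
        self.lang_head = nn.Linear(256, vocab_size)

    def loss(self, obs, act_type, move_tgt, lang_tgt):
        """Behavioural-cloning loss: action-type loss plus the loss of the chosen branch."""
        h = self.trunk(obs)
        ce = nn.functional.cross_entropy
        type_loss = ce(self.type_head(h), act_type)
        move_loss = ce(self.move_head(h), move_tgt, reduction="none")
        lang_loss = ce(self.lang_head(h), lang_tgt, reduction="none")
        branch = torch.where(act_type == 0, move_loss, lang_loss).mean()
        return type_loss + branch
```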
The evolution of digital technologies and the growing popularity of sports have inspired innovators to take sports-inclined users to an entirely new level through Fantasy Sports Platforms (FSPs). The application of data science and analytics is ubiquitous in the modern world; it opens doors to deeper understanding and supports decision-making processes. We firmly believe data science can be employed to predict winning fantasy cricket teams on the FSP Dream11. We built a predictive model that forecasts player performance in upcoming games. Using a combination of greedy and knapsack algorithms, we pick a set of 11 players to create a fantasy cricket team with the greatest statistical odds of being the strongest team, thereby giving us a greater chance of winning a wager on the Dream11 FSP. We used the PyCaret Python library to help us understand and adopt the best regression algorithm for the problem statement and make precise predictions. Furthermore, we used the Plotly Python library to provide visual insights into how teams and players are expected to perform, computed from statistical and subjective factors for upcoming games. Interactive plots helped us refine the recommendations of our predictive model. You either win big, win small, or lose a wager depending on how the players picked for your fantasy team perform in the expected game, and our model increases your likelihood of winning big.
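A minimal sketch of the team-selection step described above, assuming the usual Dream11-style setup of an 11-player team under a 100-credit budget with 0.5-credit granularity (the player tuples, budget, and predicted points are illustrative assumptions; the paper additionally combines greedy heuristics with PyCaret-predicted points):

```python
# Cardinality-constrained 0/1 knapsack: exactly `team_size` players, credit cost
# at most `budget`, maximising predicted fantasy points.
def pick_team(players, budget=100.0, team_size=11):
    """players: list of (name, predicted_points, credit_cost); returns (points, names) or None."""
    B = int(budget * 2)                                   # credits come in 0.5 steps -> integer units
    # dp[k][b]: best (total_points, chosen_names) using exactly k players and at most b half-credits
    dp = [[None] * (B + 1) for _ in range(team_size + 1)]
    dp[0] = [(0.0, ())] * (B + 1)
    for name, pts, cost in players:
        c = int(round(cost * 2))
        for k in range(team_size, 0, -1):                 # iterate k downwards so each player is used once
            for b in range(B, c - 1, -1):
                prev = dp[k - 1][b - c]
                if prev is not None:
                    cand_pts = prev[0] + pts
                    if dp[k][b] is None or cand_pts > dp[k][b][0]:
                        dp[k][b] = (cand_pts, prev[1] + (name,))
    best = [x for x in dp[team_size] if x is not None]
    return max(best, key=lambda x: x[0]) if best else None

# e.g. pick_team([("Kohli", 55.2, 10.5), ("Bumrah", 48.7, 9.0), ...]) with >= 11 candidates
```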
Generative, ML-driven interactive systems have the potential to change how people interact with computers in creative processes, turning tools into co-creators. However, it is currently unclear how we can achieve effective human-AI collaboration in open-ended task domains. There are several known challenges around communication in interactions with ML-driven systems. An overlooked aspect of co-creative system design is how to better support users in learning to collaborate with such systems. Here we reframe human-AI co-creation as a learning problem: inspired by research on team learning, we hypothesize that learning strategies that work for human-human teams might also increase the effectiveness and quality of collaboration for humans working with co-creative generative systems. In this position paper, we aim to promote team learning as a lens for designing more effective co-creative human-AI collaboration, and we highlight the quality of the collaborative process as a goal for co-creative systems. Furthermore, we outline a preliminary schematic for embedding team-learning support in co-creative AI systems. Finally, we propose a research agenda and pose open questions for further study on supporting people in learning to collaborate with generative AI systems.
The highest grossing media franchise of all time, with over \$90 billion in total revenue, is Pokemon. The video games belong to the class of Japanese Role Playing Games (J-RPG). Developing a powerful AI agent for these games is very hard because they pose big challenges for MinMax, Monte Carlo Tree Search and statistical Machine Learning, as they are vastly different from the games well explored in the AI literature. An AI agent for one of these games would mean significant progress in AI agents for the entire class. Further, the key principles of such work can hopefully inspire approaches to several domains that require excellent teamwork under conditions of extreme uncertainty, including managing a team of doctors, robots or employees in an ever-changing environment, like a pandemic-stricken region or a war zone. In this paper we first explain the mechanics of the game and perform a game analysis. We continue by proposing unique AI algorithms based on our understanding that the two biggest challenges in the game are keeping a balanced team and dealing with three sources of uncertainty. Later on, we describe why evaluating the performance of such agents is challenging and present the results of our approach. Our AI agent performed significantly better than all previous attempts and peaked at 33rd place in the world in one of the most popular battle formats, while running on only 4 single-socket servers.
Cooperative multi-agent reinforcement learning (c-MARL) is widely applied in safety-critical scenarios, so analysing the robustness of c-MARL models is profoundly important. However, robustness certification for c-MARL has not yet been explored in the community. In this paper, we propose a novel certification method, the first to leverage a scalable approach for c-MARL to determine actions with guaranteed certified bounds. c-MARL certification poses two key challenges compared with single-agent systems: (i) the accumulated uncertainty as the number of agents increases; (ii) the potentially limited impact that changing a single agent's action has on the global team reward. These challenges prevent us from directly using existing algorithms. Hence, we employ a false discovery rate (FDR) controlling procedure that accounts for the importance of each agent to certify per-state robustness, and we propose a tree-search-based algorithm to find a lower bound on the global reward under the minimal certified perturbation. As our method is general, it can also be applied in single-agent environments. We empirically show that our certification bounds are much tighter than state-of-the-art RL certification solutions. We also run experiments on two popular c-MARL algorithms, QMIX and VDN, in two different environments, with two and four agents. The experimental results show that our method produces meaningful guaranteed robustness for all models and environments. Our tool CertifyCMARL is available at https://github.com/TrustAI/CertifyCMA
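For reference only, the standard Benjamini-Hochberg FDR controlling procedure over one p-value per agent (e.g., from a smoothing-based stability test) is sketched below; the paper's agent-importance weighting and the tree search over the global reward are not reproduced here, and the source of the p-values is an assumption:

```python
# Plain Benjamini-Hochberg procedure: reject the largest prefix of sorted
# p-values that stays under the i/m * alpha thresholds.
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of per-agent hypotheses rejected at FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)                                  # sort p-values ascending
    thresholds = alpha * np.arange(1, m + 1) / m           # BH thresholds i/m * alpha
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()                     # largest rank meeting the BH condition
        reject[order[:k + 1]] = True                       # reject all hypotheses up to rank k
    return reject

# e.g. benjamini_hochberg([0.001, 0.01, 0.20, 0.03]) -> array([ True,  True, False,  True])
```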
Search and Rescue (SAR) missions in remote environments often employ autonomous multi-robot systems that learn, plan, and execute a combination of local single-robot control actions, group primitives, and global mission-oriented coordination and collaboration. Often, SAR coordination strategies are manually designed by human experts who can remotely control the multi-robot system and enable semi-autonomous operations. However, in remote environments where connectivity is limited and human intervention is often not possible, decentralized collaboration strategies are needed for fully-autonomous operations. Nevertheless, decentralized coordination may be ineffective in adversarial environments due to sensor noise, actuation faults, or manipulation of inter-agent communication data. In this paper, we propose an algorithmic approach based on adversarial multi-agent reinforcement learning (MARL) that allows robots to efficiently coordinate their strategies in the presence of adversarial inter-agent communications. In our setup, the objective of the multi-robot team is to discover targets strategically in an obstacle-strewn geographical area by minimizing the average time needed to find the targets. It is assumed that the robots have no prior knowledge of the target locations, and they can interact with only a subset of neighboring robots at any time. Based on the centralized training with decentralized execution (CTDE) paradigm in MARL, we utilize a hierarchical meta-learning framework to learn dynamic team-coordination modalities and discover emergent team behavior under complex cooperative-competitive scenarios. The effectiveness of our approach is demonstrated on a collection of prototype grid-world environments with different specifications of benign and adversarial agents, target locations, and agent rewards.
Human operators in human-robot teams are commonly perceived to be critical for mission success. To explore the direct and perceived impact of operator input on task success and team performance, 16 real-world missions (10 hrs) were conducted based on the DARPA Subterranean Challenge. These missions deployed a heterogeneous team of robots on a search task to locate and identify artifacts such as climbing rope, drills and mannequins representing human survivors. Two conditions were evaluated: human operators that could control the robot team with state-of-the-art autonomy (Human-Robot Team), compared to autonomous missions without human operator input (Robot-Autonomy). Human-Robot Teams were often in directed autonomy mode (70% of mission time), found more items, traversed more distance, covered more unique ground, and had a longer time between safety-related events. Human-Robot Teams were faster at finding the first artifact, but slower to respond to information from the robot team. In routine conditions, scores were comparable for artifacts, distance, and coverage. Reasons for intervention included creating waypoints to prioritise high-yield areas and navigating through error-prone spaces. After observing robot autonomy, operators reported increased robot competency and trust, but also that robot behaviour was not always transparent and understandable, even after high mission performance.
We present a method for controlling a swarm using its spectral decomposition -- that is, by describing the set of trajectories of a swarm in terms of a spatial distribution throughout the operational domain -- guaranteeing scale invariance with respect to the number of agents both for computation and for the operator tasked with controlling the swarm. We use ergodic control, decentralized across the network, for implementation. In the DARPA OFFSET program field setting, we test this interface design for the operator using the STOMP interface -- the same interface used by Raytheon BBN throughout the duration of the OFFSET program. In these tests, we demonstrate that our approach is scale-invariant -- the user specification does not depend on the number of agents; it is persistent -- the specification remains active until the user specifies a new command; and it is real-time -- the user can interact with and interrupt the swarm at any time. Moreover, we show that the spectral/ergodic specification of swarm behavior degrades gracefully as the number of agents goes down, enabling the operator to maintain the same approach as agents become disabled or are added to the network. We demonstrate the scale-invariance and dynamic response of our system in a field-relevant simulator on a variety of tactical scenarios with up to 50 agents. We also demonstrate the dynamic response of our system in the field with a smaller team of agents. Lastly, we make the code for our system available.
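For readers unfamiliar with the spectral/ergodic specification, the sketch below shows the standard construction: the operator's command is a spatial distribution encoded by a finite set of cosine Fourier coefficients, and the ergodic metric compares them to coefficients averaged over all agents' trajectories, so the specification never references the number of agents. The domain size, basis truncation, and normalisation are illustrative assumptions rather than the OFFSET/STOMP implementation:

```python
# Spectral encoding of a target spatial distribution and the ergodic metric
# comparing it to the (agent-count-independent) trajectory average.
import numpy as np

def basis(k, x, L=(1.0, 1.0)):
    """Separable cosine basis F_k(x) on a rectangular domain, k = (k1, k2)."""
    return np.cos(k[0] * np.pi * x[..., 0] / L[0]) * np.cos(k[1] * np.pi * x[..., 1] / L[1])

def distribution_coeffs(phi_samples, weights, K=5):
    """phi_k: coefficients of the operator's target distribution, from weighted samples."""
    return {(k1, k2): np.sum(weights * basis((k1, k2), phi_samples))
            for k1 in range(K) for k2 in range(K)}

def trajectory_coeffs(trajectories, K=5):
    """c_k: time- and agent-averaged coefficients; adding or losing agents only changes the average."""
    pts = np.concatenate(trajectories, axis=0)             # stack all agents' (T_i, 2) position arrays
    return {(k1, k2): np.mean(basis((k1, k2), pts))
            for k1 in range(K) for k2 in range(K)}

def ergodic_metric(c, phi, dim=2):
    """Sobolev-weighted squared distance between trajectory and target coefficients."""
    return sum((1.0 + np.sum(np.square(k))) ** (-(dim + 1) / 2) * (c[k] - phi[k]) ** 2
               for k in phi)
```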
A quantitative assessment of the global importance of an agent in a team is as valuable as gold for strategists, decision-makers, and sports coaches. Yet, retrieving this information is not trivial, since in a cooperative task it is hard to isolate the performance of an individual from that of the whole team. Moreover, the relationship between an agent's role and its personal attributes is not always clear. In this work we conceive an application of Shapley analysis for studying the contribution of both agent policies and attributes, putting them on equal footing. Since the computational complexity is NP-hard and scales exponentially with the number of participants in a transferable-utility coalitional game, we resort to exploiting a priori knowledge about the rules of the game to constrain the relations between the participants over a graph. We hence propose a method to determine a Hierarchical Knowledge Graph of agents' policies and features in a Multi-Agent System. Assuming a simulator of the system is available, the graph structure allows us to exploit dynamic programming to assess the importances much faster. We test the proposed approach in a proof-of-concept environment deploying both hardcoded policies and policies obtained via Deep Reinforcement Learning. The proposed paradigm is less computationally demanding than naively computing the Shapley values and provides great insight not only into the importance of an agent in a team but also into the attributes needed to deploy the policy at its best.
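For context, the brute-force Shapley computation that the hierarchical knowledge graph and dynamic programming are designed to avoid looks like the sketch below; each "player" can be an agent policy or an attribute, and the value function would come from the simulator (the `value` callable is a stand-in, not the paper's code):

```python
# Exact Shapley values by enumerating coalitions: exponential in the number of
# players, shown only as the baseline the paper speeds up.
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """value(frozenset_of_players) -> scalar team performance from the simulator."""
    n = len(players)
    shap = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for r in range(n):
            for S in combinations(others, r):
                S = frozenset(S)
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                shap[p] += weight * (value(S | {p}) - value(S))   # marginal contribution of p
    return shap
```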
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets, an important step toward deploying multi-agent systems in real-world applications. In practice, however, each individual behavior policy that generates the multi-agent joint trajectories usually performs at a different level, e.g., one agent follows a random policy while the other agents follow medium-quality policies. In cooperative games with a global reward, an agent trained by existing offline MARL methods often inherits this random policy, jeopardizing the performance of the entire team. In this paper, we investigate offline MARL with explicit consideration of the diversity of agent-wise trajectories and propose a novel framework called Shared Individual Trajectories (SIT) to address this problem. Specifically, an attention-based reward decomposition network assigns credit to each agent through a differentiable key-value memory mechanism in an offline manner. These decomposed credits are then used to reconstruct the joint offline dataset into prioritized experience replay with individual trajectories, after which agents can share their good trajectories and conservatively train their policies with a critic based on a graph attention network (GAT). We evaluate our method in both discrete control (i.e., StarCraft II and the multi-agent particle environment) and continuous control (i.e., multi-agent MuJoCo). The results show that our method achieves significantly better performance on complex and mixed offline multi-agent datasets, especially when data quality differs widely across individual trajectories.
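A minimal sketch of the attention-based, key-value reward-decomposition idea described above, assuming a simple setup where per-agent observations query a learned memory and the resulting scores split the global reward into per-agent credits; the dimensions, the softmax split, and the memory layout are assumptions for illustration, not the SIT architecture:

```python
# Illustrative attention-based credit assignment: per-agent credits sum back to
# the global reward via a softmax over memory-attention scores.
import torch
import torch.nn as nn

class RewardDecomposer(nn.Module):
    def __init__(self, obs_dim=32, d=64, n_slots=16):
        super().__init__()
        self.query = nn.Linear(obs_dim, d)
        self.memory_keys = nn.Parameter(torch.randn(n_slots, d))
        self.memory_vals = nn.Parameter(torch.randn(n_slots, 1))
        self.d = d

    def forward(self, agent_obs, global_reward):
        """agent_obs: (n_agents, obs_dim); returns per-agent credits summing to global_reward."""
        q = self.query(agent_obs)                                  # (n_agents, d)
        attn = torch.softmax(q @ self.memory_keys.T / self.d ** 0.5, dim=-1)
        scores = (attn @ self.memory_vals).squeeze(-1)             # (n_agents,)
        credit_weights = torch.softmax(scores, dim=0)              # split sums to 1
        return credit_weights * global_reward
```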